{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hybrid Search with FastEmbed & Qdrant\n",
    "\n",
    "Author: [Nirant Kasliwal](https://twitter.com/nirantk)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What will we do?\n",
    "This notebook demonstrates hybrid search with FastEmbed and Qdrant.\n",
    "\n",
    "1. Setup: Download and install the required dependencies\n",
    "2. Preview data: Load and preview the data\n",
    "3. Create Sparse Embeddings: Create SPLADE++ embeddings for the data\n",
    "4. Create Dense Embeddings: Create BGE-Large-en-v1.5 embeddings for the data\n",
    "5. Indexing: Index the embeddings using Qdrant\n",
    "6. Search: Perform Hybrid Search using FastEmbed & Qdrant\n",
    "7. Ranking: Rank the search results with Reciprocal Rank Fusion (RRF)\n",
    "\n",
    "## Setup\n",
    "\n",
    "To get started, install the dependencies below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU qdrant-client fastembed datasets transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:46:59.293555Z",
     "start_time": "2024-03-30T00:46:59.285419Z"
    }
   },
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from datasets import load_dataset\n",
    "from qdrant_client import QdrantClient\n",
    "from qdrant_client.models import (\n",
    "    Distance,\n",
    "    NamedSparseVector,\n",
    "    NamedVector,\n",
    "    SparseVector,\n",
    "    PointStruct,\n",
    "    SearchRequest,\n",
    "    SparseIndexParams,\n",
    "    SparseVectorParams,\n",
    "    VectorParams,\n",
    "    ScoredPoint,\n",
    ")\n",
    "from transformers import AutoTokenizer\n",
    "\n",
    "import fastembed\n",
    "from fastembed import SparseEmbedding, SparseTextEmbedding, TextEmbedding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'0.2.5'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fastembed.__version__  # 0.2.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:06.815264Z",
     "start_time": "2024-03-30T00:47:00.149649Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dataset({\n",
       "    features: ['example_id', 'query', 'query_id', 'product_id', 'product_locale', 'esci_label', 'small_version', 'large_version', 'product_title', 'product_description', 'product_bullet_point', 'product_brand', 'product_color', 'product_text'],\n",
       "    num_rows: 919\n",
       "})"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset = load_dataset(\"tasksource/esci\", split=\"train\")\n",
    "# We'll select the first 1000 examples for this demo\n",
    "dataset = dataset.select(range(1000))\n",
    "# Keep only the rows for the US locale\n",
    "dataset = dataset.filter(lambda x: x[\"product_locale\"] == \"us\")\n",
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preview Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:06.831216Z",
     "start_time": "2024-03-30T00:47:06.810740Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>example_id</th>\n",
       "      <th>query</th>\n",
       "      <th>query_id</th>\n",
       "      <th>product_id</th>\n",
       "      <th>product_locale</th>\n",
       "      <th>esci_label</th>\n",
       "      <th>small_version</th>\n",
       "      <th>large_version</th>\n",
       "      <th>product_title</th>\n",
       "      <th>product_description</th>\n",
       "      <th>product_bullet_point</th>\n",
       "      <th>product_brand</th>\n",
       "      <th>product_color</th>\n",
       "      <th>product_text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>revent 80 cfm</td>\n",
       "      <td>0</td>\n",
       "      <td>B000MOO21W</td>\n",
       "      <td>us</td>\n",
       "      <td>Irrelevant</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...</td>\n",
       "      <td>None</td>\n",
       "      <td>WhisperCeiling fans feature a totally enclosed...</td>\n",
       "      <td>Panasonic</td>\n",
       "      <td>White</td>\n",
       "      <td>Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>revent 80 cfm</td>\n",
       "      <td>0</td>\n",
       "      <td>B07X3Y6B1V</td>\n",
       "      <td>us</td>\n",
       "      <td>Exact</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Homewerks 7141-80 Bathroom Fan Integrated LED ...</td>\n",
       "      <td>None</td>\n",
       "      <td>OUTSTANDING PERFORMANCE: This Homewerk's bath ...</td>\n",
       "      <td>Homewerks</td>\n",
       "      <td>80 CFM</td>\n",
       "      <td>Homewerks 7141-80 Bathroom Fan Integrated LED ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>revent 80 cfm</td>\n",
       "      <td>0</td>\n",
       "      <td>B07WDM7MQQ</td>\n",
       "      <td>us</td>\n",
       "      <td>Exact</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Homewerks 7140-80 Bathroom Fan Ceiling Mount E...</td>\n",
       "      <td>None</td>\n",
       "      <td>OUTSTANDING PERFORMANCE: This Homewerk's bath ...</td>\n",
       "      <td>Homewerks</td>\n",
       "      <td>White</td>\n",
       "      <td>Homewerks 7140-80 Bathroom Fan Ceiling Mount E...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>revent 80 cfm</td>\n",
       "      <td>0</td>\n",
       "      <td>B07RH6Z8KW</td>\n",
       "      <td>us</td>\n",
       "      <td>Exact</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Delta Electronics RAD80L BreezRadiance 80 CFM ...</td>\n",
       "      <td>This pre-owned or refurbished product has been...</td>\n",
       "      <td>Quiet operation at 1.5 sones\\nBuilt-in thermos...</td>\n",
       "      <td>DELTA ELECTRONICS (AMERICAS) LTD.</td>\n",
       "      <td>White</td>\n",
       "      <td>Delta Electronics RAD80L BreezRadiance 80 CFM ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>4</td>\n",
       "      <td>revent 80 cfm</td>\n",
       "      <td>0</td>\n",
       "      <td>B07QJ7WYFQ</td>\n",
       "      <td>us</td>\n",
       "      <td>Exact</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Panasonic FV-08VRE2 Ventilation Fan with Reces...</td>\n",
       "      <td>None</td>\n",
       "      <td>The design solution for Fan/light combinations...</td>\n",
       "      <td>Panasonic</td>\n",
       "      <td>White</td>\n",
       "      <td>Panasonic FV-08VRE2 Ventilation Fan with Reces...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   example_id           query  query_id  product_id product_locale  \\\n",
       "0           0   revent 80 cfm         0  B000MOO21W             us   \n",
       "2           1   revent 80 cfm         0  B07X3Y6B1V             us   \n",
       "3           2   revent 80 cfm         0  B07WDM7MQQ             us   \n",
       "4           3   revent 80 cfm         0  B07RH6Z8KW             us   \n",
       "5           4   revent 80 cfm         0  B07QJ7WYFQ             us   \n",
       "\n",
       "   esci_label  small_version  large_version  \\\n",
       "0  Irrelevant              0              1   \n",
       "2       Exact              0              1   \n",
       "3       Exact              0              1   \n",
       "4       Exact              0              1   \n",
       "5       Exact              0              1   \n",
       "\n",
       "                                       product_title  \\\n",
       "0  Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...   \n",
       "2  Homewerks 7141-80 Bathroom Fan Integrated LED ...   \n",
       "3  Homewerks 7140-80 Bathroom Fan Ceiling Mount E...   \n",
       "4  Delta Electronics RAD80L BreezRadiance 80 CFM ...   \n",
       "5  Panasonic FV-08VRE2 Ventilation Fan with Reces...   \n",
       "\n",
       "                                 product_description  \\\n",
       "0                                               None   \n",
       "2                                               None   \n",
       "3                                               None   \n",
       "4  This pre-owned or refurbished product has been...   \n",
       "5                                               None   \n",
       "\n",
       "                                product_bullet_point  \\\n",
       "0  WhisperCeiling fans feature a totally enclosed...   \n",
       "2  OUTSTANDING PERFORMANCE: This Homewerk's bath ...   \n",
       "3  OUTSTANDING PERFORMANCE: This Homewerk's bath ...   \n",
       "4  Quiet operation at 1.5 sones\\nBuilt-in thermos...   \n",
       "5  The design solution for Fan/light combinations...   \n",
       "\n",
       "                       product_brand product_color  \\\n",
       "0                          Panasonic         White   \n",
       "2                          Homewerks        80 CFM   \n",
       "3                          Homewerks         White   \n",
       "4  DELTA ELECTRONICS (AMERICAS) LTD.         White   \n",
       "5                          Panasonic         White   \n",
       "\n",
       "                                        product_text  \n",
       "0  Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...  \n",
       "2  Homewerks 7141-80 Bathroom Fan Integrated LED ...  \n",
       "3  Homewerks 7140-80 Bathroom Fan Ceiling Mount E...  \n",
       "4  Delta Electronics RAD80L BreezRadiance 80 CFM ...  \n",
       "5  Panasonic FV-08VRE2 Ventilation Fan with Reces...  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "source_df = dataset.to_pandas()\n",
    "df = source_df.drop_duplicates(\n",
    "    subset=[\"product_text\", \"product_title\", \"product_bullet_point\", \"product_brand\"]\n",
    ")\n",
    "df = df.dropna(subset=[\"product_text\", \"product_title\", \"product_bullet_point\", \"product_brand\"])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:06.842492Z",
     "start_time": "2024-03-30T00:47:06.831564Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Catalog Item Count: 176\n",
      "Queries: 919\n"
     ]
    }
   ],
   "source": [
    "print(f\"Catalog Item Count: {len(df)}\\nQueries: {len(source_df)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:06.843214Z",
     "start_time": "2024-03-30T00:47:06.835501Z"
    }
   },
   "outputs": [],
   "source": [
    "df[\"combined_text\"] = (\n",
    "    df[\"product_title\"] + \"\\n\" + df[\"product_text\"] + \"\\n\" + df[\"product_bullet_point\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:06.843675Z",
     "start_time": "2024-03-30T00:47:06.837385Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "176"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Sparse Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:08.983795Z",
     "start_time": "2024-03-30T00:47:06.839334Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "930a97b272324022a4ce1a2ff7637c53",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "74778cee8094426da10c2e2bb780daa0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 7 files:   0%|          | 0/7 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sparse_model_name = \"prithvida/Splade_PP_en_v1\"\n",
    "dense_model_name = \"BAAI/bge-large-en-v1.5\"\n",
    "# This triggers the model download\n",
    "sparse_model = SparseTextEmbedding(model_name=sparse_model_name, batch_size=32)\n",
    "dense_model = TextEmbedding(model_name=dense_model_name, batch_size=32)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:09.018037Z",
     "start_time": "2024-03-30T00:47:08.996912Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[SparseEmbedding(values=array([0.17295611, 0.80484658, 0.41356239, 0.38512513, 0.90825951,\n",
       "        0.61373132, 0.18313883, 0.18546289, 0.04257777, 1.20476401,\n",
       "        1.48403311, 0.17008089, 0.06487759, 0.16780744, 0.23214206,\n",
       "        2.5722568 , 1.87174428, 0.2541669 , 0.20749982, 0.16315481,\n",
       "        0.70712435, 0.26381177, 0.49152234, 0.67282563, 0.19267203,\n",
       "        0.29127747, 0.09682107, 1.21251154, 0.19741221, 0.44512141,\n",
       "        0.44369081, 0.21676107, 0.36704862, 0.06706504, 1.97674787,\n",
       "        0.00856015, 0.51626593, 0.21145488, 0.09790635, 0.26357391,\n",
       "        1.6925323 , 2.10766435, 0.05584541, 0.05150893, 0.24062614,\n",
       "        0.90479541, 0.1198509 , 0.10030396]), indices=array([ 1037,  2003,  2005,  2190,  2204,  2307,  2338,  2497,  2565,\n",
       "         2793,  3075,  3177,  3274,  3286,  3430,  3435,  3793,  3819,\n",
       "         3835,  3989,  4007,  4118,  4248,  4289,  4294,  4322,  4434,\n",
       "         4667,  4773,  5080,  5371,  5527,  6028,  6581,  6633,  6919,\n",
       "         6981,  6994,  7809,  7812,  7861,  8270,  9262,  9896, 10472,\n",
       "        13850, 16602, 23924]))]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def make_sparse_embedding(texts: list[str]) -> list[SparseEmbedding]:\n",
    "    return list(sparse_model.embed(texts, batch_size=32))\n",
    "\n",
    "\n",
    "sparse_embedding: list[SparseEmbedding] = make_sparse_embedding(\n",
    "    [\"Fastembed is a great library for text embeddings!\"]\n",
    ")\n",
    "sparse_embedding"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The previous output is a SparseEmbedding object for the first document in our list.\n",
    "\n",
    "It contains two arrays: values and indices. \n",
    "- The 'values' array represents the weights of the features (tokens) in the document.\n",
    "- The 'indices' array represents the indices of these features in the model's vocabulary.\n",
    "\n",
    "Each pair of corresponding values and indices represents a token and its weight in the document."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is still a little abstract, so let's use the tokenizer vocab to make sense of these indices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:47:12.171596Z",
     "start_time": "2024-03-30T00:47:12.166737Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'model': 'prithvida/Splade_PP_en_v1',\n",
       "  'vocab_size': 30522,\n",
       "  'description': 'Misspelled version of the model. Retained for backward compatibility. Independent Implementation of SPLADE++ Model for English',\n",
       "  'size_in_GB': 0.532,\n",
       "  'sources': {'hf': 'Qdrant/SPLADE_PP_en_v1'}},\n",
       " {'model': 'prithivida/Splade_PP_en_v1',\n",
       "  'vocab_size': 30522,\n",
       "  'description': 'Independent Implementation of SPLADE++ Model for English',\n",
       "  'size_in_GB': 0.532,\n",
       "  'sources': {'hf': 'Qdrant/SPLADE_PP_en_v1'}}]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "SparseTextEmbedding.list_supported_models()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"fast\": 2.5722568035125732,\n",
      "    \"##bed\": 2.1076643466949463,\n",
      "    \"##em\": 1.9767478704452515,\n",
      "    \"text\": 1.8717442750930786,\n",
      "    \"em\": 1.6925323009490967,\n",
      "    \"library\": 1.4840331077575684,\n",
      "    \"##ding\": 1.2125115394592285,\n",
      "    \"bed\": 1.2047640085220337,\n",
      "    \"good\": 0.9082595109939575,\n",
      "    \"librarian\": 0.9047954082489014,\n",
      "    \"is\": 0.8048465847969055,\n",
      "    \"software\": 0.7071243524551392,\n",
      "    \"format\": 0.6728256344795227,\n",
      "    \"great\": 0.613731324672699,\n",
      "    \"texts\": 0.5162659287452698,\n",
      "    \"quick\": 0.49152234196662903,\n",
      "    \"device\": 0.4451214075088501,\n",
      "    \"file\": 0.44369080662727356,\n",
      "    \"for\": 0.4135623872280121,\n",
      "    \"best\": 0.38512513041496277,\n",
      "    \"technique\": 0.36704862117767334,\n",
      "    \"facility\": 0.2912774682044983,\n",
      "    \"method\": 0.26381176710128784,\n",
      "    \"ideal\": 0.26357391476631165,\n",
      "    \"perfect\": 0.2541669011116028,\n",
      "    \"##bing\": 0.24062614142894745,\n",
      "    \"material\": 0.23214206099510193,\n",
      "    \"storage\": 0.21676106750965118,\n",
      "    \"tool\": 0.21145488321781158,\n",
      "    \"nice\": 0.20749981701374054,\n",
      "    \"web\": 0.19741220772266388,\n",
      "    \"architecture\": 0.1926720291376114,\n",
      "    \"##b\": 0.18546289205551147,\n",
      "    \"book\": 0.18313883244991302,\n",
      "    \"a\": 0.17295610904693604,\n",
      "    \"speed\": 0.17008088529109955,\n",
      "    \"##am\": 0.1678074449300766,\n",
      "    \"##ization\": 0.16315481066703796,\n",
      "    \"browser\": 0.11985089629888535,\n",
      "    \"##ogen\": 0.10030396282672882,\n",
      "    \"database\": 0.09790635108947754,\n",
      "    \"connection\": 0.09682106971740723,\n",
      "    \"excellent\": 0.0670650377869606,\n",
      "    \"computer\": 0.06487759202718735,\n",
      "    \"java\": 0.055845409631729126,\n",
      "    \"algorithm\": 0.051508933305740356,\n",
      "    \"program\": 0.04257776960730553,\n",
      "    \"wonderful\": 0.00856015458703041\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "def get_tokens_and_weights(sparse_embedding, model_name) -> dict[str, float]:\n",
    "    # Find the tokenizer for the model\n",
    "    tokenizer_source = None\n",
    "    for model_info in SparseTextEmbedding.list_supported_models():\n",
    "        if model_info[\"model\"].lower() == model_name.lower():\n",
    "            tokenizer_source = model_info[\"sources\"][\"hf\"]\n",
    "            break\n",
    "    else:\n",
    "        # for/else: reached only when no supported model matched\n",
    "        raise ValueError(f\"Model {model_name} not found in the supported models.\")\n",
    "\n",
    "    tokenizer = AutoTokenizer.from_pretrained(tokenizer_source)\n",
    "    token_weight_dict: dict[str, float] = {}\n",
    "    for i in range(len(sparse_embedding.indices)):\n",
    "        token = tokenizer.decode([sparse_embedding.indices[i]])\n",
    "        weight = sparse_embedding.values[i]\n",
    "        token_weight_dict[token] = weight\n",
    "\n",
    "    # Sort the dictionary by weights\n",
    "    token_weight_dict = dict(\n",
    "        sorted(token_weight_dict.items(), key=lambda item: item[1], reverse=True)\n",
    "    )\n",
    "    return token_weight_dict\n",
    "\n",
    "\n",
    "# Test the function with the first SparseEmbedding\n",
    "print(json.dumps(get_tokens_and_weights(sparse_embedding[0], sparse_model_name), indent=4))"
   ]
  },
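  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how these token weights turn into a relevance score, here is a minimal sketch of a dot product between two sparse vectors, a common way to score sparse embeddings: only token indices present in both vectors contribute. The helper `sparse_dot` and the toy indices and weights are assumptions made up for illustration; they are not real SPLADE++ output and not part of the FastEmbed API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def sparse_dot(indices_a, values_a, indices_b, values_b) -> float:\n",
    "    # Look up vector b's weight by token index; indices missing from b count as 0\n",
    "    weights_b = dict(zip(indices_b.tolist(), values_b.tolist()))\n",
    "    return float(\n",
    "        sum(w * weights_b.get(i, 0.0) for i, w in zip(indices_a.tolist(), values_a.tolist()))\n",
    "    )\n",
    "\n",
    "\n",
    "# Hypothetical toy vectors (illustration only)\n",
    "doc_indices = np.array([2005, 3075, 3793])\n",
    "doc_values = np.array([0.5, 1.5, 2.0])\n",
    "query_indices = np.array([3075, 3793, 9999])\n",
    "query_values = np.array([1.0, 0.5, 3.0])\n",
    "\n",
    "# Only indices 3075 and 3793 overlap, so index 9999 adds nothing to the score\n",
    "sparse_dot(query_indices, query_values, doc_indices, doc_values)"
   ]
  },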
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Dense Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:11.018700Z",
     "start_time": "2024-03-30T00:48:10.975766Z"
    }
   },
   "outputs": [],
   "source": [
    "def make_dense_embedding(texts: list[str]) -> list[np.ndarray]:\n",
    "    return list(dense_model.embed(texts))\n",
    "\n",
    "\n",
    "dense_embedding = make_dense_embedding([\"Fastembed is a great library for text embeddings!\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:12.418869Z",
     "start_time": "2024-03-30T00:48:12.413593Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1024,)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dense_embedding[0].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:13.789029Z",
     "start_time": "2024-03-30T00:48:13.777529Z"
    }
   },
   "outputs": [],
   "source": [
    "product_texts = df[\"combined_text\"].tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 5min 57s, sys: 22 s, total: 6min 19s\n",
      "Wall time: 1min 37s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "df[\"sparse_embedding\"] = make_sparse_embedding(product_texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that FastEmbed uses data parallelism to speed up the embedding generation process. \n",
    "\n",
    "This improves throughput and reduces the time it takes to generate embeddings for large datasets. \n",
    "\n",
    "For our small dataset here, on my local machine -- it reduces the time from user's 6 min 15 seconds to a wall time of about 3 min 6 seconds, or about 2x faster. This is a function of the number of CPU cores available on the machine, CPU usage and other factors -- so your mileage may vary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      SparseEmbedding(values=array([0.06509431, 0.57...\n",
       "2      SparseEmbedding(values=array([0.10595927, 0.20...\n",
       "3      SparseEmbedding(values=array([0.1140037 , 0.02...\n",
       "4      SparseEmbedding(values=array([6.13510251e-01, ...\n",
       "5      SparseEmbedding(values=array([0.90058267, 0.12...\n",
       "                             ...                        \n",
       "780    SparseEmbedding(values=array([5.56782305e-01, ...\n",
       "809    SparseEmbedding(values=array([0.38585788, 0.44...\n",
       "828    SparseEmbedding(values=array([3.27695787e-01, ...\n",
       "867    SparseEmbedding(values=array([0.36255798, 0.74...\n",
       "870    SparseEmbedding(values=array([3.74321818e-01, ...\n",
       "Name: sparse_embedding, Length: 176, dtype: object"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"sparse_embedding\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 15min 51s, sys: 31.7 s, total: 16min 23s\n",
      "Wall time: 3min\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "df[\"dense_embedding\"] = make_dense_embedding(product_texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:19.041998Z",
     "start_time": "2024-03-30T00:48:19.036147Z"
    }
   },
   "outputs": [],
   "source": [
    "client = QdrantClient(\":memory:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### About Qdrant\n",
    "\n",
    "Qdrant is a vector similarity search engine that allows you to index and search high-dimensional vectors. It supports both sparse and dense embeddings, and it's a great tool for building search engines. \n",
    "\n",
    "Here, we use the memory mode which is Numpy under the hood for demonstration purposes. In production, you can use the Docker or Cloud for full DB support."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:23.670884Z",
     "start_time": "2024-03-30T00:48:23.661594Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "collection_name = \"esci\"\n",
    "client.create_collection(\n",
    "    collection_name,\n",
    "    vectors_config={\n",
    "        \"text-dense\": VectorParams(\n",
    "            size=1024,  # Output dimension of BAAI/bge-large-en-v1.5\n",
    "            distance=Distance.COSINE,\n",
    "        )\n",
    "    },\n",
    "    sparse_vectors_config={\n",
    "        \"text-sparse\": SparseVectorParams(\n",
    "            index=SparseIndexParams(\n",
    "                on_disk=False,\n",
    "            )\n",
    "        )\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:26.252861Z",
     "start_time": "2024-03-30T00:48:25.724035Z"
    }
   },
   "outputs": [],
   "source": [
    "def make_points(df: pd.DataFrame) -> list[PointStruct]:\n",
    "    sparse_vectors = df[\"sparse_embedding\"].tolist()\n",
    "    product_texts = df[\"combined_text\"].tolist()\n",
    "    dense_vectors = df[\"dense_embedding\"].tolist()\n",
    "    rows = df.to_dict(orient=\"records\")\n",
    "    points = []\n",
    "    for idx, (text, sparse_vector, dense_vector) in enumerate(\n",
    "        zip(product_texts, sparse_vectors, dense_vectors)\n",
    "    ):\n",
    "        sparse_vector = SparseVector(\n",
    "            indices=sparse_vector.indices.tolist(), values=sparse_vector.values.tolist()\n",
    "        )\n",
    "        point = PointStruct(\n",
    "            id=idx,\n",
    "            payload={\n",
    "                \"text\": text,\n",
    "                \"product_id\": rows[idx][\"product_id\"],\n",
    "            },  # Add any additional payload if necessary\n",
    "            vector={\n",
    "                \"text-sparse\": sparse_vector,\n",
    "                \"text-dense\": dense_vector.tolist(),\n",
    "            },\n",
    "        )\n",
    "        points.append(point)\n",
    "    return points\n",
    "\n",
    "\n",
    "points: list[PointStruct] = make_points(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.upsert(collection_name, points)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:48.298878Z",
     "start_time": "2024-03-30T00:48:48.256591Z"
    }
   },
   "outputs": [],
   "source": [
    "def search(query_text: str):\n",
    "    # Compute sparse and dense vectors for the query\n",
    "    query_sparse_vectors: list[SparseEmbedding] = make_sparse_embedding([query_text])\n",
    "    query_dense_vector: list[np.ndarray] = make_dense_embedding([query_text])\n",
    "\n",
    "    search_results = client.search_batch(\n",
    "        collection_name=collection_name,\n",
    "        requests=[\n",
    "            SearchRequest(\n",
    "                vector=NamedVector(\n",
    "                    name=\"text-dense\",\n",
    "                    vector=query_dense_vector[0].tolist(),\n",
    "                ),\n",
    "                limit=10,\n",
    "                with_payload=True,\n",
    "            ),\n",
    "            SearchRequest(\n",
    "                vector=NamedSparseVector(\n",
    "                    name=\"text-sparse\",\n",
    "                    vector=SparseVector(\n",
    "                        indices=query_sparse_vectors[0].indices.tolist(),\n",
    "                        values=query_sparse_vectors[0].values.tolist(),\n",
    "                    ),\n",
    "                ),\n",
    "                limit=10,\n",
    "                with_payload=True,\n",
    "            ),\n",
    "        ],\n",
    "    )\n",
    "\n",
    "    return search_results\n",
    "\n",
    "\n",
    "query_text = \" revent 80 cfm\"\n",
    "search_results = search(query_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ranking\n",
    "\n",
    "We'll combine the results from the two models using Reciprocal Rank Fusion (RRF). You can read more about RRF [here](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).\n",
    "\n",
    "We select RRF for this task because:\n",
    "1. It is a simple and effective method for combining search results.\n",
    "2. It is robust to the differences in the ranking scores of the two or more ranking lists.\n",
    "3. It is easy to implement and requires minimal tuning (only one parameter: alpha, which we don't tune here)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-30T00:48:53.075137Z",
     "start_time": "2024-03-30T00:48:53.059828Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('A', 0.033465871107430434),\n",
       " ('B', 0.033465871107430434),\n",
       " ('D', 0.03320985472238179),\n",
       " ('C', 0.03294544435749548),\n",
       " ('E', 0.01775980832584606)]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def rrf(rank_lists, alpha=60, default_rank=1000):\n",
    "    \"\"\"\n",
    "    Optimized Reciprocal Rank Fusion (RRF) using NumPy for large rank lists.\n",
    "\n",
    "    :param rank_lists: A list of rank lists. Each rank list should be a list of (item, rank) tuples.\n",
    "    :param alpha: The parameter alpha used in the RRF formula. Default is 60.\n",
    "    :param default_rank: The rank assigned to items missing from a rank list, so they still contribute a small 1 / (alpha + default_rank) term. Default is 1000.\n",
    "    :return: List of (item, score) tuples sorted by RRF score, highest first.\n",
    "    \"\"\"\n",
    "    # Consolidate all unique items from all rank lists\n",
    "    all_items = set(item for rank_list in rank_lists for item, _ in rank_list)\n",
    "\n",
    "    # Create a mapping of items to indices\n",
    "    item_to_index = {item: idx for idx, item in enumerate(all_items)}\n",
    "\n",
    "    # Initialize a matrix to hold the ranks, filled with the default rank\n",
    "    rank_matrix = np.full((len(all_items), len(rank_lists)), default_rank)\n",
    "\n",
    "    # Fill in the actual ranks from the rank lists\n",
    "    for list_idx, rank_list in enumerate(rank_lists):\n",
    "        for item, rank in rank_list:\n",
    "            rank_matrix[item_to_index[item], list_idx] = rank\n",
    "\n",
    "    # Calculate RRF scores using NumPy operations\n",
    "    rrf_scores = np.sum(1.0 / (alpha + rank_matrix), axis=1)\n",
    "\n",
    "    # Sort items based on RRF scores\n",
    "    sorted_indices = np.argsort(-rrf_scores)  # Negative for descending order\n",
    "\n",
    "    # Retrieve sorted items\n",
    "    items = list(item_to_index)  # dicts preserve insertion order, so items[idx] matches matrix row idx\n",
    "    sorted_items = [(items[idx], rrf_scores[idx]) for idx in sorted_indices]\n",
    "\n",
    "    return sorted_items\n",
    "\n",
    "\n",
    "# Example usage\n",
    "rank_list1 = [(\"A\", 1), (\"B\", 2), (\"C\", 3)]\n",
    "rank_list2 = [(\"B\", 1), (\"C\", 2), (\"D\", 3)]\n",
    "rank_list3 = [(\"A\", 2), (\"D\", 1), (\"E\", 3)]\n",
    "\n",
    "# Combine the rank lists\n",
    "sorted_items = rrf([rank_list1, rank_list2, rank_list3])\n",
    "sorted_items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on this, let's convert our dense and sparse results into rank lists and then combine them with Reciprocal Rank Fusion (RRF)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rank_list(search_result: list[ScoredPoint]):\n",
    "    return [(point.id, rank + 1) for rank, point in enumerate(search_result)]\n",
    "\n",
    "\n",
    "dense_rank_list, sparse_rank_list = rank_list(search_results[0]), rank_list(search_results[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "rrf_rank_list = rrf([dense_rank_list, sparse_rank_list])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(3, 0.032018442622950824),\n",
       " (8, 0.03149801587301587),\n",
       " (1, 0.03131881575727918),\n",
       " (13, 0.030834914611005692),\n",
       " (15, 0.030536130536130537),\n",
       " (9, 0.030309988518943745),\n",
       " (12, 0.030158730158730156),\n",
       " (14, 0.029437229437229435),\n",
       " (11, 0.028985507246376812),\n",
       " (2, 0.01707242848447961),\n",
       " (4, 0.01564927857935627)]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rrf_rank_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Record(id=3, payload={'text': 'Delta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDelta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWhite\\nThis pre-owned or refurbished product has been professionally inspected and tested to work and look like new. How a product becomes part of Amazon Renewed, your destination for pre-owned, refurbished products: A customer buys a new product and returns it or trades it in for a newer or different model. That product is inspected and tested to work and look like new by Amazon-qualified suppliers. Then, the product is sold as an Amazon Renewed product on Amazon. If not satisfied with the purchase, renewed products are eligible for replacement or refund under the Amazon Renewed Guarantee.\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter', 'product_id': 'B07RH6Z8KW'}, vector=None, shard_key=None),\n",
       " Record(id=8, payload={'text': 'Aero Pure ABF80 L5 W ABF80L5 Ceiling Mount 80 CFM w/LED Light/Nightlight, Energy Star Certified, White Quiet Bathroom Ventilation Fan\\nAero Pure ABF80 L5 W ABF80L5 Ceiling Mount 80 CFM w/LED Light/Nightlight, Energy Star Certified, White Quiet Bathroom Ventilation Fan\\nAero Pure\\nWhite\\nNone\\nQuiet 0.3 Sones, 80 CFM fan with choice of three designer grilles in White, Satin Nickel, or Oil Rubbed Bronze; Full 6 year warranty\\n10W 3000K 800 Lumens LED Light with 0.7W Nightlight included\\nInstallation friendly- Quick-mount adjustable metal bracket for new construction and retrofit; 4”, 5: and 6” metal duct adaptor included\\nMeets today’s demanding building specifications- ETL Listed for wet application, ENERGY STAR certified, CALGreen, JA-8 Compliant for CA Title 24, and ASHRAE 62.2 compliant\\nHousing dimensions- 10 2/5”x10 2/5”x 7 ½”; Grille dimensions- 13”x13”; Fits 2\"x8\" joists\\nQuiet 0.3 Sones, 80 CFM fan with choice of three designer grilles in White, Satin Nickel, or Oil Rubbed Bronze; Full 6 year warranty\\n10W 3000K 800 Lumens LED Light with 0.7W Nightlight included\\nInstallation friendly- Quick-mount adjustable metal bracket for new construction and retrofit; 4”, 5: and 6” metal duct adaptor included\\nMeets today’s demanding building specifications- ETL Listed for wet application, ENERGY STAR certified, CALGreen, JA-8 Compliant for CA Title 24, and ASHRAE 62.2 compliant\\nHousing dimensions- 10 2/5”x10 2/5”x 7 ½”; Grille dimensions- 13”x13”; Fits 2\"x8\" joists', 'product_id': 'B07JY1PQNT'}, vector=None, shard_key=None),\n",
       " Record(id=1, payload={'text': \"Homewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks\\n80 CFM\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. 
Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\", 'product_id': 'B07X3Y6B1V'}, vector=None, shard_key=None),\n",
       " Record(id=13, payload={'text': 'Delta BreezSignature VFB25ACH 80 CFM Exhaust Bath Fan with Humidity Sensor\\nDelta BreezSignature VFB25ACH 80 CFM Exhaust Bath Fan with Humidity Sensor\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWhite\\nNone\\nVirtually silent at less than 0.3 sones\\nPrecision engineered with DC brushless motor for extended reliability\\nEasily switch in and out of humidity sensing mode by toggling wall switch\\nENERGY STAR qualified for efficient cost-saving operation\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nVirtually silent at less than 0.3 sones\\nPrecision engineered with DC brushless motor for extended reliability\\nEasily switch in and out of humidity sensing mode by toggling wall switch\\nENERGY STAR qualified for efficient cost-saving operation\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances', 'product_id': 'B003O0MNGC'}, vector=None, shard_key=None),\n",
       " Record(id=15, payload={'text': 'Delta Electronics (Americas) Ltd. GBR80HLED Delta BreezGreenBuilder Series 80 CFM Fan/Dimmable H, LED Light, Dual Speed & Humidity Sensor\\nDelta Electronics (Americas) Ltd. GBR80HLED Delta BreezGreenBuilder Series 80 CFM Fan/Dimmable H, LED Light, Dual Speed & Humidity Sensor\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith LED Light, Dual Speed & Humidity Sensor\\nNone\\nUltra energy-efficient LED module (11-watt equivalent to 60-watt incandescent light) included. Main light output-850 Lumens, 3000K\\nExtracts air at a rate of 80 CFM to properly ventilate bathrooms up to 80 sq. Ft., quiet operation at 0.8 sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nEnergy Star qualified for efficient cost-saving operation, galvanized steel construction resists corrosion\\nFan impeller Stops If obstructed, for safe worry-free operation, attractive grille gives your bathroom a fresh look\\nUltra energy-efficient LED module (11-watt equivalent to 60-watt incandescent light) included. Main light output-850 Lumens, 3000K\\nExtracts air at a rate of 80 CFM to properly ventilate bathrooms up to 80 sq. Ft., quiet operation at 0.8 sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nEnergy Star qualified for efficient cost-saving operation, galvanized steel construction resists corrosion\\nFan impeller Stops If obstructed, for safe worry-free operation, attractive grille gives your bathroom a fresh look', 'product_id': 'B01N5Y6002'}, vector=None, shard_key=None),\n",
       " Record(id=9, payload={'text': \"Delta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDelta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith Heater\\nNone\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\", 'product_id': 'B01MZIK0PI'}, vector=None, shard_key=None),\n",
       " Record(id=12, payload={'text': 'Aero Pure AP80RVLW Super Quiet 80 CFM Recessed Fan/Light Bathroom Ventilation Fan with White Trim Ring\\nAero Pure AP80RVLW Super Quiet 80 CFM Recessed Fan/Light Bathroom Ventilation Fan with White Trim Ring\\nAero Pure\\nWhite\\nNone\\nSuper quiet 80CFM energy efficient fan virtually disappears into the ceiling leaving only a recessed light in view\\nMay be installed over shower when wired to a GFCI breaker and used with a PAR30L 75W (max) CFL\\nBulb not included. Accepts any of the following bulbs: 75W Max. PAR30, 14W Max. BR30 LED, or 75W Max. PAR30L (for use over tub/shower.)\\nSuper quiet 80CFM energy efficient fan virtually disappears into the ceiling leaving only a recessed light in view\\nMay be installed over shower when wired to a GFCI breaker and used with a PAR30L 75W (max) CFL\\nBulb not included. Accepts any of the following bulbs: 75W Max. PAR30, 14W Max. BR30 LED, or 75W Max. PAR30L (for use over tub/shower.)', 'product_id': 'B00MARNO5Y'}, vector=None, shard_key=None),\n",
       " Record(id=14, payload={'text': 'Broan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan-NuTone\\nWhite\\nNone\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home', 'product_id': 'B001E6DMKY'}, vector=None, shard_key=None),\n",
       " Record(id=11, payload={'text': 'Panasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic\\nWhite\\nNone\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy', 'product_id': 'B00XBZFWWM'}, vector=None, shard_key=None),\n",
       " Record(id=2, payload={'text': 'Homewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks\\nWhite\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. 
Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited', 'product_id': 'B07WDM7MQQ'}, vector=None, shard_key=None),\n",
       " Record(id=4, payload={'text': 'Panasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic\\nWhite\\nNone\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design', 'product_id': 'B07QJ7WYFQ'}, vector=None, shard_key=None)]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def find_point_by_id(\n",
    "    client: QdrantClient, collection_name: str, rrf_rank_list: list[tuple[int, float]]\n",
    "):\n",
    "    return client.retrieve(\n",
    "        collection_name=collection_name, ids=[item[0] for item in rrf_rank_list]\n",
    "    )\n",
    "\n",
    "\n",
    "find_point_by_id(client, collection_name, rrf_rank_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's check the ESCI (Exact, Substitute, Complement, and Irrelevant) label of each result against the source data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n",
      "Exact\n"
     ]
    }
   ],
   "source": [
    "# Point ids are assumed to match row positions in df, as the lookup below implies\n",
    "ids = [item[0] for item in rrf_rank_list]\n",
    "\n",
    "for idx in ids:\n",
    "    print(df.iloc[idx][\"esci_label\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a strong result: every one of the 11 fused results (the union of the top 10 from each search) is labeled Exact. That is encouraging for a small dataset like this, especially with out-of-the-box embeddings that are not fine-tuned for e-commerce."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "11"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(rrf_rank_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "In this notebook, we demonstrated hybrid search with FastEmbed and Qdrant. We used FastEmbed to create sparse and dense embeddings for the data and indexed them in Qdrant. We then queried both vector types in a single batch request and fused the two result lists using Reciprocal Rank Fusion (RRF)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
